ON THE FLY GENERATION OF MULTIMEDIA CODE FOR IMAGE PROCESSING

Background of the Invention

Field of the Invention

The invention relates to the processing of multimedia data with processors that
feature multimedia-enhanced instruction sets. More particularly, the invention
relates to a method and apparatus for generating processor instruction
sequences for image processing routines that use multimedia-enhanced
instructions.

Description of the Prior Art 

In general, most programs that use image processing routines with multimedia
instructions do not use a general-purpose compiler for these parts of the
program. These programs typically use assembly routines to process such data.
A resulting problem is that the assembly routines must be added to the code
manually. This step requires high technical skill, is time-consuming, and is
prone to introducing errors into the code.

In addition, different types of processors (for example, Intel's Pentium I with
MMX, Pentium II, Pentium III, and Willamette, and AMD's K-6 and K-7, a.k.a.
Athlon) each use different multimedia command sets. Examples of different
multimedia command sets are MMX, SSE, and 3DNow. Applications that use these
multimedia command sets must have separate assembly routines that are
specifically written for each processor type.

At runtime, the applications select the proper assembly routines based on the
processor detected. To reduce the workload and increase the robustness of the
code, these assembly routines are sometimes generated by a routine-specific
source code generator during program development.

One problem with this type of programming is that the applications must carry
redundant assembly routines that process the same multimedia data but are
written for different types of processors, even though only one assembly
routine is actually used at runtime. Because there are many generations of
processors in existence, applications that use multimedia instructions must
grow in size to be compatible with all of these processors. In addition, as
new processors are developed, new routines must be coded for these
applications so that they are compatible with the new processors. An
application that is released prior to the release of a processor is
incompatible with the processor unless it is first patched or rebuilt with the
new assembly routines.

It would be desirable to provide programs that use multimedia instructions and
are smaller in size. It would also be desirable to provide an approach that
adapts such programs to future processors more easily.

Summary of the Invention 

In accordance with the invention, a method and apparatus for generating
assembly routines that process multimedia data with multimedia-enhanced
instructions is shown and described.

An example of multimedia data that can be processed by multimedia instructions
is the pixel blocks used in image processing. Most image processing routines
operate on rectangular blocks of evenly sized data pieces (e.g. 16x16 pixel
blocks of 8-bit video during MPEG motion compensation). The image processing
code is described as a set of source blocks, destination blocks, and data
manipulations. Each block has a start address, a pitch (the distance in bytes
between two consecutive lines), and a data format. The full processing code
includes width and height as additional parameters. All of these parameters can
be either integer constants or arguments to the generated routine. All data
operations are described on SIMD data types. A SIMD data type is a basic data
type (e.g. signed byte, signed word, or unsigned byte) and a number of repeats
(e.g. 16 pixels for MPEG macroblocks). The size of a block (source or
destination) is always the size of its SIMD data type times its width in the
horizontal direction and its height in the vertical direction.
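
The block parameters described above can be illustrated with a minimal sketch. The names SimdType, BlockDesc, lineBytes, blockBytes, and lineOffset are hypothetical and are not taken from Tables A-C; the start address is modeled as a byte offset into a buffer purely for illustration.

```cpp
#include <cstddef>

// Hypothetical SIMD data type: a basic data type plus a number of repeats.
struct SimdType {
    std::size_t elementBytes;  // size of the basic type, e.g. 1 for an unsigned byte
    std::size_t repeats;       // elements per SIMD value, e.g. 16 pixels
};

// Hypothetical block descriptor: start address, pitch, and data format.
struct BlockDesc {
    std::size_t startOffset;   // start address, modeled as a byte offset
    std::size_t pitch;         // distance in bytes between two consecutive lines
    SimdType    type;          // data format of one SIMD element
};

// Bytes covered by one line: SIMD type size times the width in SIMD units.
inline std::size_t lineBytes(const BlockDesc& b, std::size_t width) {
    return b.type.elementBytes * b.type.repeats * width;
}

// Payload size of the block: line size times the height in lines.
inline std::size_t blockBytes(const BlockDesc& b, std::size_t width, std::size_t height) {
    return lineBytes(b, width) * height;
}

// Byte offset of line y; consecutive lines are one pitch apart.
inline std::size_t lineOffset(const BlockDesc& b, std::size_t y) {
    return b.startOffset + y * b.pitch;
}
```

Under these assumptions, a 16x16 block of 8-bit pixels is one SIMD unit wide and sixteen lines high, so its payload is 256 bytes regardless of the pitch of the surrounding image.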

In the presently preferred embodiment of the invention, an abstract routine
generator inside the application program produces an abstract representation
of the code that operates on the multimedia data using SIMD operations. A
directed acyclic graph is a typical example of such an abstract
representation. A translator then generates processor-specific assembly code
from the abstract representation.

Brief Description of the Drawings 

FIG. 1 is a block diagram of a computer system that may be used to implement a
method and apparatus embodying the invention, in which a multimedia routine is
translated from its abstract representation, generated by an abstract routine
generator inside the application's startup code, into executable code using
the code generator.

Description of the Preferred Embodiment

In FIG. 1, the startup code 11 of the application program 13, further referred
to as the abstract routine generator, generates an abstract representation 15
of the multimedia routine in the form of a data flow graph. This graph is then
translated by the code generator 17 into a machine-specific sequence of
instructions 19, typically including several SIMD multimedia instructions. The
types of operations that can be present inside the data flow graph include add,
sub, multiply, average, maximum, minimum, compare, and, or, xor, pack, unpack,
and merge operations. This list is not exhaustive, as there are operations
currently performed by MMX, SSE, and 3DNow, for example, which are not listed.
If a specific command set does not support one of these operations, the
CPU-specific part of the code generator replaces it with a sequence of simpler
instructions (e.g. the maximum instruction can be replaced by a pair of
subtract and add instructions using saturation arithmetic).
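
The replacement of a maximum operation by saturated subtract and add can be sketched per element as follows. This is a scalar illustration for unsigned 8-bit values only; the function names are hypothetical, and an actual translator would emit the corresponding packed instructions rather than scalar code.

```cpp
#include <cstdint>

// Unsigned saturating subtract: clamps at 0 instead of wrapping around.
inline std::uint8_t subSaturated(std::uint8_t a, std::uint8_t b) {
    return a > b ? static_cast<std::uint8_t>(a - b) : static_cast<std::uint8_t>(0);
}

// Unsigned saturating add: clamps at 255 instead of wrapping around.
inline std::uint8_t addSaturated(std::uint8_t a, std::uint8_t b) {
    unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
    return static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
}

// max(a, b) expressed as (a -sat b) + b.
// If a > b the difference is a - b and adding b restores a;
// otherwise the difference saturates to 0 and the result is b.
inline std::uint8_t maxViaSaturation(std::uint8_t a, std::uint8_t b) {
    return addSaturated(subSaturated(a, b), b);
}
```

The identity holds for unsigned elements because the saturated difference is never negative; signed element types would need a different replacement sequence.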

The abstract routine generator generates an abstract representation of the
code, commonly in the form of a directed acyclic graph, during runtime. This
allows the creation of multiple similar routines using a loop inside the image
processing code 21 for linear arrays, or the generation of routines on the fly
depending on user interaction. For example, the bidirectional MPEG-2 motion
compensation can be implemented using a set of sixty-four different but very
similar routines that can be generated by a loop in the abstract routine
generator. Similarly, an interactive paint program can generate filters or
pens in the form of abstract representations based on user input, and can use
the routine generator to create efficient code sequences to perform the
filtering or drawing operation. Examples of the data types processed by the
code sequences include SIMD input data, image input data, and audio input
data.
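
The count of sixty-four follows from six two-valued selectors (luma or chroma, error correction present or not, and four half pel flags). A minimal sketch of enumerating the variants in nested loops is shown below; RoutineKey, routineIndex, and enumerateRoutines are hypothetical names, and the sketch only enumerates the keys rather than generating code.

```cpp
#include <array>
#include <cstddef>

// Hypothetical key: one flag per selector used to pick a routine variant.
struct RoutineKey {
    bool uv, delta, half1y, half1x, half2y, half2x;
};

// Packs the six flags into a table index in the range 0..63.
inline std::size_t routineIndex(const RoutineKey& k) {
    return (static_cast<std::size_t>(k.uv)     << 5) |
           (static_cast<std::size_t>(k.delta)  << 4) |
           (static_cast<std::size_t>(k.half1y) << 3) |
           (static_cast<std::size_t>(k.half1x) << 2) |
           (static_cast<std::size_t>(k.half2y) << 1) |
            static_cast<std::size_t>(k.half2x);
}

// Visits every combination once; a generator would build one routine per key.
inline std::array<RoutineKey, 64> enumerateRoutines() {
    std::array<RoutineKey, 64> table{};
    for (int uv = 0; uv < 2; ++uv)
      for (int delta = 0; delta < 2; ++delta)
        for (int h1y = 0; h1y < 2; ++h1y)
          for (int h1x = 0; h1x < 2; ++h1x)
            for (int h2y = 0; h2y < 2; ++h2y)
              for (int h2x = 0; h2x < 2; ++h2x) {
                  RoutineKey k{uv != 0, delta != 0, h1y != 0, h1x != 0, h2y != 0, h2x != 0};
                  table[routineIndex(k)] = k;
              }
    return table;
}
```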

Examples of information provided by the graphs include the source blocks, the 
target blocks, the change in the block, color, stride, change in stride, display 
block, and spatial filtering. 

The accuracy of the operations inside the graphs can be tailored to meet the
requirements of the program. The abstract routine generator can increase its
precision by increasing the number of bits of arithmetic used per pixel. For
example, 7-bit processing can be stepped up to 8-bit, or 8-bit to 16-bit. For
instance, motion compensation routines with different types of rounding
precision can be generated by the abstract routine generator.
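
One way to see why rounding precision matters is to compare a four-way average built from nested two-way rounded averages, which stays within 8 bits, with one computed in a wider intermediate. The following is an illustrative sketch with hypothetical function names; it is not a statement of which variant any particular routine uses.

```cpp
#include <cstdint>

// Two-way average rounding towards nearest; ties round up.
inline std::uint8_t avgRound(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>((a + b + 1) >> 1);
}

// Four-way average from nested two-way averages: every step fits in 8 bits,
// but each intermediate rounding can bias the result upward.
inline std::uint8_t avg4Nested(std::uint8_t a, std::uint8_t b,
                               std::uint8_t c, std::uint8_t d) {
    return avgRound(avgRound(a, b), avgRound(c, d));
}

// Four-way average with a wider intermediate sum and a single rounding step.
inline std::uint8_t avg4Wide(std::uint8_t a, std::uint8_t b,
                             std::uint8_t c, std::uint8_t d) {
    unsigned sum = static_cast<unsigned>(a) + b + c + d + 2u;
    return static_cast<std::uint8_t>(sum >> 2);
}
```

For the inputs 0, 1, 0, 0 the nested form yields 1 while the wide form yields 0, so the two variants are not interchangeable where bit-exact results are required.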

The abstract representation, in this case the graph 15, is then sent to the
translator 17, where it is translated into optimized assembly code 19. The
translator uses standard compiler techniques to translate the generic graph
structure into a specific sequence of assembly instructions. Because the
description is very generic, it has no link to a specific processor
architecture, and because it is very simple, it can be processed without
requiring complex compiler techniques. This enables the translation to be
executed during program startup without causing a significant delay. Also, the
abstract routine generator and the translator do not have to be programmed in
assembly. The CPU-specific translator may reside in a dynamic link library and
can therefore be replaced if the system processor is changed. This enables
programs to use the multimedia instructions of a new processor without
themselves being changed.
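
The translation of a graph into a linear instruction sequence can be sketched with a toy data flow graph and a post-order walk. Node, makeNode, emit, and translate are hypothetical names chosen for this illustration, and the textual "instructions" stand in for real machine code; the sketch does not model register allocation or instruction selection.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical graph node: an operation, an optional block name, and inputs.
struct Node {
    std::string op;                             // e.g. "LOAD", "AVG", "STORE"
    std::string operand;                        // block name for LOAD/STORE
    std::vector<std::shared_ptr<Node>> inputs;  // producer nodes
};
using NodePtr = std::shared_ptr<Node>;

inline NodePtr makeNode(std::string op, std::string operand,
                        std::vector<NodePtr> inputs = {}) {
    return std::make_shared<Node>(Node{std::move(op), std::move(operand), std::move(inputs)});
}

// Emits inputs before their consumer. A node shared by several consumers is
// emitted only once, which is what makes the graph a DAG rather than a tree.
inline int emit(const NodePtr& n, std::map<const Node*, int>& done,
                std::vector<std::string>& out) {
    auto it = done.find(n.get());
    if (it != done.end()) return it->second;
    std::string line = n->op;
    if (!n->operand.empty()) line += " " + n->operand;
    for (const NodePtr& in : n->inputs)
        line += " r" + std::to_string(emit(in, done, out));
    int reg = static_cast<int>(out.size());
    out.push_back("r" + std::to_string(reg) + " = " + line);
    done[n.get()] = reg;
    return reg;
}

// Translates the graph rooted at a store node into a linear sequence.
inline std::vector<std::string> translate(const NodePtr& root) {
    std::map<const Node*, int> done;
    std::vector<std::string> out;
    emit(root, done, out);
    return out;
}
```

A CPU-specific translator would differ only in what it emits for each node, which is why the graph itself can stay independent of the processor.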



Tables A-C provide sample code that generates an abstract representation for a
motion compensation routine that can be translated into an executable code
sequence using the invention.



TABLE A 



#ifndef MPEG2MOTIONCOMPENSATION_H
#define MPEG2MOTIONCOMPENSATION_H

#include "driver\softwarecinemaster\common\prelude.h"
#include "..\..\BlockVideoProcessor\BVPXMMXCodeConverter.h"

//
// Basic block motion compensation functions
//
class MPEG2MotionCompensation
{
protected:
  //
  // Function prototype for a unidirectional motion compensation routine
  //
  typedef void (__stdcall * CompensationCodeType)(BYTE * source1Base, int sourceStride,
                                                  BYTE * targetBase, short * deltaBase,
                                                  int deltaStride, int num);

  //
  // Function prototype for a bidirectional motion compensation routine
  //
  typedef void (__stdcall * BiCompensationCodeType)(BYTE * source1Base, BYTE * source2Base,
                                                    int sourceStride,
                                                    BYTE * targetBase, short * deltaBase,
                                                    int deltaStride, int num);

  //
  // Motion compensation routines for unidirectional prediction. Each routine
  // handles one case. The indices are
  //   - y-uv  : if it is luma data the index is 0, otherwise 1
  //   - delta : error correction data is present (e.g. the block is not skipped)
  //   - halfy : half pel prediction is to be performed in vertical direction
  //   - halfx : half pel prediction is to be performed in horizontal direction
  //
  CompensationCodeType compensation[2][2][2][2];   // y-uv delta halfy halfx
  BVPCodeBlock * compensationBlock[2][2][2][2];

  //
  // Motion compensation routines for bidirectional prediction. Each routine
  // handles one case. The indices contain the same parameters as in the
  // unidirectional case, plus the half pel selectors for the second source
  //
  BiCompensationCodeType bicompensation[2][2][2][2][2][2];   // y-uv delta half1y half1x half2y half2x
  BVPCodeBlock * bicompensationBlock[2][2][2][2][2][2];

public:
  //
  // Perform a unidirectional compensation
  //
  void MotionCompensation(BYTE * sourcep, int stride, BYTE * destp,
                          short * deltap, int dstride, int num, bool uv, bool delta,
                          int halfx, int halfy)
  {
    compensation[uv][delta][halfy][halfx](sourcep, stride, destp, deltap, dstride, num);
  }

  //
  // Perform bidirectional compensation
  //
  void BiMotionCompensation(BYTE * source1p, BYTE * source2p, int stride,
                            BYTE * destp, short * deltap, int dstride, int num, bool uv,
                            bool delta, int half1x, int half1y, int half2x, int half2y)
  {
    bicompensation[uv][delta][half1y][half1x][half2y][half2x](source1p, source2p, stride,
                                                              destp, deltap, dstride, num);
  }

  MPEG2MotionCompensation(void);
  ~MPEG2MotionCompensation(void);
};

#endif



TABLE B 



#include "MPEG2MotionCompensation.h"
#include "..\..\BlockVideoProcessor\BVPXMMXCodeConverter.h"

//
// Create the dataflow to fetch a data element from a source block,
// with or without half pel compensation in horizontal and/or
// vertical direction.
//
BVPDataSourceInstruction * BuildBlockMerge(BVPSourceBlock * source1BlockA,
                                           BVPSourceBlock * source1BlockB,
                                           BVPSourceBlock * source1BlockC,
                                           BVPSourceBlock * source1BlockD,
                                           int halfx, int halfy)
{
  if (halfy)
  {
    if (halfx)
    {
      //
      // Half pel prediction in h and v direction, the graph part looks like this
      //
      //         .--(LOAD source1BlockA)
      //        /
      //    .--(AVG)
      //   /    \
      //  /      `--(LOAD source1BlockB)
      // <--(AVG)
      //  \      .--(LOAD source1BlockC)
      //   \    /
      //    `--(AVG)
      //        \
      //         `--(LOAD source1BlockD)
      //
      return new BVPDataOperation
      (
        BVPDO_AVG,
        new BVPDataOperation
        (
          BVPDO_AVG,
          new BVPDataLoad(source1BlockA),
          new BVPDataLoad(source1BlockB)
        ),
        new BVPDataOperation
        (
          BVPDO_AVG,
          new BVPDataLoad(source1BlockC),
          new BVPDataLoad(source1BlockD)
        )
      );
    }
    else
    {
      //
      // Half pel prediction in vertical direction
      //
      //     .--(LOAD source1BlockA)
      //    /
      // <--(AVG)
      //    \
      //     `--(LOAD source1BlockC)
      //
      return new BVPDataOperation
      (
        BVPDO_AVG,
        new BVPDataLoad(source1BlockA),
        new BVPDataLoad(source1BlockC)
      );
    }
  }
  else
  {
    if (halfx)
    {
      //
      // Half pel prediction in horizontal direction
      //
      //     .--(LOAD source1BlockA)
      //    /
      // <--(AVG)
      //    \
      //     `--(LOAD source1BlockB)
      //
      return new BVPDataOperation
      (
        BVPDO_AVG,
        new BVPDataLoad(source1BlockA),
        new BVPDataLoad(source1BlockB)
      );
    }
    else
    {
      //
      // Full pel prediction
      //
      // <--(LOAD source1BlockA)
      //
      return new BVPDataLoad(source1BlockA);
    }
  }
}

MPEG2MotionCompensation::MPEG2MotionCompensation(void)
{
  int yuv, delta, halfy, halfx, half1y, half1x, half2y, half2x;
  BVPBlockProcessor * bvp;
  BVPCodeBlock * code;

  BVPArgument * source1Base;
  BVPArgument * source2Base;
  BVPArgument * sourceStride;
  BVPArgument * targetBase;
  BVPArgument * deltaBase;
  BVPArgument * deltaStride;
  BVPArgument * height;

  BVPSourceBlock * source1BlockA;
  BVPSourceBlock * source1BlockB;
  BVPSourceBlock * source1BlockC;
  BVPSourceBlock * source1BlockD;
  BVPSourceBlock * source2BlockA;
  BVPSourceBlock * source2BlockB;
  BVPSourceBlock * source2BlockC;
  BVPSourceBlock * source2BlockD;
  BVPSourceBlock * deltaBlock;
  BVPTargetBlock * targetBlock;

  BVPDataSourceInstruction * postMC;
  BVPDataSourceInstruction * postCorrect;
  BVPDataSourceInstruction * deltaData;

  //
  // Build unidirectional motion compensation routines
  //
  for (yuv = 0; yuv < 2; yuv++)
  {
    for (delta = 0; delta < 2; delta++)
    {
      for (halfy = 0; halfy < 2; halfy++)
      {
        for (halfx = 0; halfx < 2; halfx++)
        {
          bvp = new BVPBlockProcessor();

          bvp->AddArgument(height = new BVPArgument(false));
          bvp->AddArgument(deltaStride = new BVPArgument(false));
          bvp->AddArgument(deltaBase = new BVPArgument(true));
          bvp->AddArgument(targetBase = new BVPArgument(true));
          bvp->AddArgument(sourceStride = new BVPArgument(false));
          bvp->AddArgument(source1Base = new BVPArgument(true));

          //
          // Width is always sixteen pixels, so one vector of sixteen
          // unsigned eight bit elements; height may vary, therefore it
          // is an argument
          //
          bvp->SetDimension(1, height);

          //
          // Four potential source blocks, B is one pel to the right,
          // C one down and D right and down
          //
          bvp->AddSourceBlock(source1BlockA = new BVPSourceBlock(source1Base,
            sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
          bvp->AddSourceBlock(source1BlockB = new BVPSourceBlock(BVPPointer(source1Base, 1 + yuv),
            sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
          bvp->AddSourceBlock(source1BlockC = new BVPSourceBlock(BVPPointer(source1Base, sourceStride, 1, 0),
            sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
          bvp->AddSourceBlock(source1BlockD = new BVPSourceBlock(BVPPointer(source1Base, sourceStride, 1, 1 + yuv),
            sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));

          //
          // If we have error correction data, we need this source block as well
          //
          if (delta)
            bvp->AddSourceBlock(deltaBlock = new BVPSourceBlock(deltaBase, deltaStride,
              BVPDataFormat(BVPDT_S16, 16), 0x10000));

          //
          // The target block to write the data into
          //
          bvp->AddTargetBlock(targetBlock = new BVPTargetBlock(targetBase, sourceStride,
            BVPDataFormat(BVPDT_U8, 16), 0x10000));

          //
          // Load a source block based on the half pel settings
          //
          bvp->AddInstruction(postMC = BuildBlockMerge(source1BlockA, source1BlockB,
            source1BlockC, source1BlockD, halfx, halfy));

          if (delta)
          {
            deltaData = new BVPDataLoad(deltaBlock);

            if (yuv)
            {
              //
              // It is chroma data and we have error correction data. The u and v
              // parts have to be interleaved, therefore we need the merge instruction
              //
              //                      .--(CONV S16)<--postMC
              //                     /
              // <--(CONV U8)<--(ADD)
              //                     \               .--(SPLIT H)<-.
              //                      \             /               \
              //                       `--(MERGE OE)                 >--(LOAD delta)
              //                                    \               /
              //                                     `--(SPLIT T)<-'
              //
              bvp->AddInstruction
              (
                postCorrect =
                  new BVPDataConvert
                  (
                    BVPDT_U8,
                    new BVPDataOperation
                    (
                      BVPDO_ADD,
                      new BVPDataConvert
                      (
                        BVPDT_S16,
                        postMC
                      ),
                      new BVPDataMerge
                      (
                        BVPDM_ODDEVEN,
                        new BVPDataSplit
                        (
                          BVPDS_HEAD,
                          deltaData
                        ),
                        new BVPDataSplit
                        (
                          BVPDS_TAIL,
                          deltaData
                        )
                      )
                    )
                  )
              );
            }
            else
            {
              //
              // It is luma data with error correction
              //
              //                      .--(CONV S16)<--postMC
              //                     /
              // <--(CONV U8)<--(ADD)
              //                     \
              //                      `--(LOAD delta)
              //
              bvp->AddInstruction
              (
                postCorrect =
                  new BVPDataConvert
                  (
                    BVPDT_U8,
                    new BVPDataOperation
                    (
                      BVPDO_ADD,
                      new BVPDataConvert
                      (
                        BVPDT_S16,
                        postMC
                      ),
                      deltaData
                    )
                  )
              );
            }

            //
            // Store into the target block
            //
            // (STORE targetBlock)<--...
            //
            bvp->AddInstruction
            (
              new BVPDataStore
              (
                targetBlock,
                postCorrect
              )
            );
          }
          else
          {
            //
            // No error correction data, so store motion result into target block
            //
            // (STORE targetBlock)<--...
            //
            bvp->AddInstruction
            (
              new BVPDataStore
              (
                targetBlock,
                postMC
              )
            );
          }

          BVPXMMXCodeConverter conv;

          //
          // Convert graph into machine language
          //
          compensationBlock[yuv][delta][halfy][halfx] = code = conv.Convert(bvp);

          //
          // Get function entry pointer
          //
          compensation[yuv][delta][halfy][halfx] =
            (CompensationCodeType)(code->GetCodeAddress());

          //
          // Delete graph
          //
          delete bvp;
        }
      }
    }
  }

  //
  // Build motion compensation routines for bidirectional prediction
  //
  for (yuv = 0; yuv < 2; yuv++)
  {
    for (delta = 0; delta < 2; delta++)
    {
      for (half1y = 0; half1y < 2; half1y++)
      {
        for (half1x = 0; half1x < 2; half1x++)
        {
          for (half2y = 0; half2y < 2; half2y++)
          {
            for (half2x = 0; half2x < 2; half2x++)
            {
              bvp = new BVPBlockProcessor();

              bvp->AddArgument(height = new BVPArgument(false));
              bvp->AddArgument(deltaStride = new BVPArgument(false));
              bvp->AddArgument(deltaBase = new BVPArgument(true));
              bvp->AddArgument(targetBase = new BVPArgument(true));
              bvp->AddArgument(sourceStride = new BVPArgument(false));
              bvp->AddArgument(source2Base = new BVPArgument(true));
              bvp->AddArgument(source1Base = new BVPArgument(true));

              bvp->SetDimension(1, height);

              //
              // We now have two source blocks, so we need eight blocks
              // for the half pel prediction
              //
              bvp->AddSourceBlock(source1BlockA = new BVPSourceBlock(source1Base,
                sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
              bvp->AddSourceBlock(source1BlockB = new BVPSourceBlock(BVPPointer(source1Base, 1 + yuv),
                sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
              bvp->AddSourceBlock(source1BlockC = new BVPSourceBlock(BVPPointer(source1Base, sourceStride, 1, 0),
                sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
              bvp->AddSourceBlock(source1BlockD = new BVPSourceBlock(BVPPointer(source1Base, sourceStride, 1, 1 + yuv),
                sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));

              bvp->AddSourceBlock(source2BlockA = new BVPSourceBlock(source2Base,
                sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
              bvp->AddSourceBlock(source2BlockB = new BVPSourceBlock(BVPPointer(source2Base, 1 + yuv),
                sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
              bvp->AddSourceBlock(source2BlockC = new BVPSourceBlock(BVPPointer(source2Base, sourceStride, 1, 0),
                sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));
              bvp->AddSourceBlock(source2BlockD = new BVPSourceBlock(BVPPointer(source2Base, sourceStride, 1, 1 + yuv),
                sourceStride, BVPDataFormat(BVPDT_U8, 16), 0x10000));

              if (delta)
                bvp->AddSourceBlock(deltaBlock = new BVPSourceBlock(deltaBase, deltaStride,
                  BVPDataFormat(BVPDT_S16, 16), 0x10000));

              bvp->AddTargetBlock(targetBlock = new BVPTargetBlock(targetBase, sourceStride,
                BVPDataFormat(BVPDT_U8, 16), 0x10000));

              //
              // Build bidirectional prediction from two unidirectional predictions
              //
              //     .--BuildBlockMerge(source1Block*)
              //    /
              // <--(AVG)
              //    \
              //     `--BuildBlockMerge(source2Block*)
              //
              bvp->AddInstruction
              (
                postMC =
                  new BVPDataOperation
                  (
                    BVPDO_AVG,
                    BuildBlockMerge(source1BlockA, source1BlockB,
                                    source1BlockC, source1BlockD, half1x, half1y),
                    BuildBlockMerge(source2BlockA, source2BlockB,
                                    source2BlockC, source2BlockD, half2x, half2y)
                  )
              );

              //
              // Apply error correction, see unidirectional case
              //
              if (delta)
              {
                deltaData = new BVPDataLoad(deltaBlock);

                if (yuv)
                {
                  bvp->AddInstruction
                  (
                    postCorrect =
                      new BVPDataConvert
                      (
                        BVPDT_U8,
                        new BVPDataOperation
                        (
                          BVPDO_ADD,
                          new BVPDataConvert
                          (
                            BVPDT_S16,
                            postMC
                          ),
                          new BVPDataMerge
                          (
                            BVPDM_ODDEVEN,
                            new BVPDataSplit
                            (
                              BVPDS_HEAD,
                              deltaData
                            ),
                            new BVPDataSplit
                            (
                              BVPDS_TAIL,
                              deltaData
                            )
                          )
                        )
                      )
                  );
                }
                else
                {
                  bvp->AddInstruction
                  (
                    postCorrect =
                      new BVPDataConvert
                      (
                        BVPDT_U8,
                        new BVPDataOperation
                        (
                          BVPDO_ADD,
                          new BVPDataConvert
                          (
                            BVPDT_S16,
                            postMC
                          ),
                          deltaData
                        )
                      )
                  );
                }

                bvp->AddInstruction
                (
                  new BVPDataStore
                  (
                    targetBlock,
                    postCorrect
                  )
                );
              }
              else
              {
                bvp->AddInstruction
                (
                  new BVPDataStore
                  (
                    targetBlock,
                    postMC
                  )
                );
              }

              BVPXMMXCodeConverter conv;

              //
              // Translate routines
              //
              bicompensationBlock[yuv][delta][half1y][half1x][half2y][half2x] =
                code = conv.Convert(bvp);

              bicompensation[yuv][delta][half1y][half1x][half2y][half2x] =
                (BiCompensationCodeType)(code->GetCodeAddress());

              delete bvp;
            }
          }
        }
      }
    }
  }
}

MPEG2MotionCompensation::~MPEG2MotionCompensation(void)
{
  int yuv, delta, halfy, halfx, half1y, half1x, half2y, half2x;

  //
  // Free all motion compensation routines
  //
  for (yuv = 0; yuv < 2; yuv++)
  {
    for (delta = 0; delta < 2; delta++)
    {
      for (halfy = 0; halfy < 2; halfy++)
      {
        for (halfx = 0; halfx < 2; halfx++)
        {
          delete compensationBlock[yuv][delta][halfy][halfx];
        }
      }
    }
  }

  for (yuv = 0; yuv < 2; yuv++)
  {
    for (delta = 0; delta < 2; delta++)
    {
      for (half1y = 0; half1y < 2; half1y++)
      {
        for (half1x = 0; half1x < 2; half1x++)
        {
          for (half2y = 0; half2y < 2; half2y++)
          {
            for (half2x = 0; half2x < 2; half2x++)
            {
              delete bicompensationBlock[yuv][delta][half1y][half1x][half2y][half2x];
            }
          }
        }
      }
    }
  }
}

Attorney Docket No. RAVI0009



TABLE C 



#ifndef BVPGENERIC_H
#define BVPGENERIC_H

#include "BVPList.h"

//
// Argument descriptor. An argument can be either a pointer or an integer used
// as a stride, offset or width/height value.
//
class BVPArgument
{
public:
  bool pointer;
  int index;

  BVPArgument(bool pointer_)
    : pointer(pointer_), index(0) {}
};

//
// Description of an integer value used as a stride or offset. An integer value
// can be either an argument or a constant
//
class BVPInteger
{
public:
  int value;
  BVPArgument * arg;

  BVPInteger(void)
    : value(0), arg(NULL) {}
  BVPInteger(int value_)
    : value(value_), arg(NULL) {}
  BVPInteger(unsigned value_)
    : value((int)value_), arg(NULL) {}
  BVPInteger(BVPArgument * arg_)
    : value(0), arg(arg_) {}

  bool operator==(BVPInteger i2)
  {
    return arg ? (i2.arg == arg) : (i2.value == value);
  }
};



//
// Description of a memory pointer used as a base for source and target blocks.
// A pointer can be a combination of a pointer base, a constant offset and
// a variable index with scaling
//
class BVPPointer
{
public:
  BVPArgument * base;
  BVPArgument * index;
  int offset;
  int scale;

  BVPPointer(BVPArgument * base_)
    : base(base_), index(NULL), offset(0), scale(0) {}

  BVPPointer(BVPPointer base_, int offset_)
    : base(base_.base), index(NULL), offset(offset_), scale(0) {}

  BVPPointer(BVPPointer base_, BVPInteger index_, int scale_, int offset_)
    : base(base_.base), index(index_.arg), offset(offset_), scale(scale_) {}
};

//
// Base data formats for scalar types
//
enum BVPBaseDataFormat
{
  BVPDT_U8,   // Unsigned 8 bits
  BVPDT_U16,  // Unsigned 16 bits
  BVPDT_U32,  // Unsigned 32 bits
  BVPDT_S8,   // Signed 8 bits
  BVPDT_S16,  // Signed 16 bits
  BVPDT_S32   // Signed 32 bits
};

//
// Data format descriptor for scalar and vector (multimedia SIMD) types.
// Each data type is a combination of a base type and a vector size.
// Scalar types are represented by a vector size of one.
//
class BVPDataFormat
{
public:
  BVPBaseDataFormat format;
  int num;

  BVPDataFormat(BVPBaseDataFormat _format, int _num = 1)
    : format(_format), num(_num) {}

  BVPDataFormat(void)
    : format(BVPDT_U8), num(0) {}

  BVPDataFormat(const BVPDataFormat & f)
    : format(f.format), num(f.num) {}

  BVPDataFormat operator*(int times)
    {return BVPDataFormat(format, num * times);}

  BVPDataFormat operator/(int times)
    {return BVPDataFormat(format, num / times);}

  // Table is indexed by BVPBaseDataFormat, in declaration order
  int BitsPerElement(void) {static const int sz[] = {8, 16, 32, 8, 16, 32}; return sz[format];}

  int BitsPerChunk(void) {return BitsPerElement() * num;}
};



//
// Operation codes for binary data operations that have the
// same operand type for both sources and the destination
//
enum BVPDataOperationCode
{
  BVPDO_ADD,            // add with wraparound
  BVPDO_ADD_SATURATED,  // add with saturation
  BVPDO_SUB,            // subtract with wraparound
  BVPDO_SUB_SATURATED,  // subtract with saturation
  BVPDO_MAX,            // maximum
  BVPDO_MIN,            // minimum
  BVPDO_AVG,            // average (includes rounding towards nearest)
  BVPDO_EQU,            // equal
  BVPDO_OR,             // binary or
  BVPDO_XOR,            // binary exclusive or
  BVPDO_AND,            // binary and
  BVPDO_ANDNOT,         // binary and not
  BVPDO_MULL,           // multiply keep lower half
  BVPDO_MULH            // multiply keep upper half
};

//
// Operations that extract a part of a data element
//
enum BVPDataSplitCode
{
  BVPDS_HEAD,  // extract first half
  BVPDS_TAIL,  // extract second half
  BVPDS_ODD,   // extract odd elements
  BVPDS_EVEN   // extract even elements
};

//
// Operations that combine two data elements
//
enum BVPDataMergeCode
{
  BVPDM_UPPERLOWER,  // chain first and second operands
  BVPDM_ODDEVEN      // interleave first and second operands
};

//
// Node types in the data flow graph
//
enum BVPInstructionType
{
  BVPIT_LOAD,       // load an element from a source block
  BVPIT_STORE,      // store an element into a target block
  BVPIT_CONSTANT,   // load a constant value
  BVPIT_SPLIT,      // split an element
  BVPIT_MERGE,      // merge two elements
  BVPIT_CONVERT,    // perform a data conversion
  BVPIT_OPERATION   // simple binary data operation
};


//
// Descriptor of a data block. Contains a base pointer, a stride (pitch), a
// format and an incrementor in vertical direction. The vertical block position
// can be incremented by a fraction or a multiple of the given pitch.
//
class BVPBlock
{
public:
  BVPPointer base;
  BVPInteger pitch;
  BVPDataFormat format;
  int yscale;
  int index;

  BVPBlock(BVPPointer _base, BVPInteger _pitch, BVPDataFormat _format, int _yscale)
    : base(_base), pitch(_pitch), format(_format), yscale(_yscale) {}
};

//
// Descriptor of a source block
//
class BVPSourceBlock : public BVPBlock
{
public:
  BVPSourceBlock(BVPPointer base, BVPInteger pitch, BVPDataFormat format, int yscale)
    : BVPBlock(base, pitch, format, yscale) {}
};

//
// Descriptor of a target block
//
class BVPTargetBlock : public BVPBlock
{
public:
  BVPTargetBlock(BVPPointer base, BVPInteger pitch, BVPDataFormat format, int yscale)
    : BVPBlock(base, pitch, format, yscale) {}
};

class BVPDataSource;
class BVPDataDrain;
class BVPDataInstruction;

//
// Source connection element of a node in the data flow graph. Each node in
// the graph contains one or no source connection. A source connection is
// the output of a node in the graph. Each source connection can be connected
// to any number of drain connections in other nodes of the flow graph. The
// source is the output side of a node.
//
class BVPDataSource
{
public:
  BVPDataFormat format;
  BVPList<BVPDataDrain *> drain;

  BVPDataSource(BVPDataFormat _format) : format(_format) {}

  virtual void AddInstructions(BVPList<BVPDataInstruction *> & instructions) {}

  virtual BVPDataInstruction * ToInstruction(void) {return NULL;}
};

//
// Drain connection element of a node in the data flow graph. Each node
// can have none, one or two drain connections (but only one drain object
// to represent both). Each drain connects to exactly one source on the
// target side. As each node can have only two inputs, each drain is connected
// (through the node) with two sources. The drain is the input side of a
// node.
//
class BVPDataDrain
{
public:
  BVPDataSource * source1;
  BVPDataSource * source2;

  BVPDataDrain(BVPDataSource * source1_, BVPDataSource * source2_ = NULL)
    : source1(source1_), source2(source2_) {}

  virtual BVPDataInstruction * ToInstruction(void) {return NULL;}
};

//
// Each node in the graph represents one abstract instruction. It has an
// instruction type that describes the operation of the node.
//
class BVPDataInstruction
{
public:
  BVPInstructionType type;
  int index;

  BVPDataInstruction(BVPInstructionType type_)
    : type(type_), index(-1) {}

  virtual ~BVPDataInstruction(void) {}

  virtual void AddInstructions(BVPList<BVPDataInstruction *> & instructions);

  virtual void GetOperationBits(int & minBits, int & maxBits);

  virtual BVPDataFormat GetInputFormat(void) = 0;
  virtual BVPDataFormat GetOutputFormat(void) = 0;

  virtual BVPDataSource * ToSource(void) {return NULL;}
  virtual BVPDataDrain * ToDrain(void) {return NULL;}
};



//
// Node that is a data source
//
class BVPDataSourceInstruction : public BVPDataInstruction, public BVPDataSource
{
public:
    BVPDataSourceInstruction(BVPInstructionType type_, BVPDataFormat format_)
        : BVPDataInstruction(type_), BVPDataSource(format_) {}

    void GetOperationBits(int & minBits, int & maxBits);

    BVPDataFormat GetOutputFormat(void) {return format;}
    BVPDataFormat GetInputFormat(void) {return format;}

    BVPDataInstruction * ToInstruction(void) {return this;}
    BVPDataSource * ToSource(void) {return this;}
};

//
// Node that is a data source and has one or two sources connected to
// its drain
//
class BVPDataSourceDrainInstruction : public BVPDataSourceInstruction,
                                      public BVPDataDrain
{
public:
    BVPDataSourceDrainInstruction(BVPInstructionType type_,
            BVPDataFormat format_, BVPDataSource * source1_)
        : BVPDataSourceInstruction(type_, format_),
          BVPDataDrain(source1_)
        {source1->drain.Insert(this);}

    BVPDataSourceDrainInstruction(BVPInstructionType type_,
            BVPDataFormat format_, BVPDataSource * source1_,
            BVPDataSource * source2_)
        : BVPDataSourceInstruction(type_, format_),
          BVPDataDrain(source1_, source2_)
        {source1->drain.Insert(this); source2->drain.Insert(this);}
};

//
// Instruction to load data from a source block
//
class BVPDataLoad : public BVPDataSourceInstruction
{
public:
    BVPSourceBlock * block;
    int offset;

    BVPDataLoad(BVPSourceBlock * block_, int offset_ = 0)
        : BVPDataSourceInstruction(BVPIT_LOAD, block_->format),
          block(block_), offset(offset_) {}

    void AddInstructions(BVPList<BVPDataInstruction *> & instructions);
};



//
// Instruction to store data into a target block
//
class BVPDataStore : public BVPDataInstruction, public BVPDataDrain
{
public:
    BVPTargetBlock * block;

    BVPDataStore(BVPTargetBlock * block_, BVPDataSource * source)
        : BVPDataInstruction(BVPIT_STORE), BVPDataDrain(source),
          block(block_)
        {source->drain.Insert(this);}

    void AddInstructions(BVPList<BVPDataInstruction *> & instructions);

    BVPDataFormat GetOutputFormat(void) {return source1->format;}
    BVPDataFormat GetInputFormat(void) {return source1->format;}

    BVPDataInstruction * ToInstruction(void) {return this;}
    BVPDataDrain * ToDrain(void) {return this;}
};



//
// Instruction to load a constant
//
class BVPDataConstant : public BVPDataSourceInstruction
{
public:
    int value;

    BVPDataConstant(BVPDataFormat format, int value_)
        : BVPDataSourceInstruction(BVPIT_CONSTANT, format),
          value(value_) {}
};

//
// Instruction to split a data element
//
class BVPDataSplit : public BVPDataSourceDrainInstruction
{
public:
    BVPDataSplitCode code;

    BVPDataSplit(BVPDataSplitCode code_, BVPDataSource * source)
        : BVPDataSourceDrainInstruction(BVPIT_SPLIT, source->format / 2,
                                        source),
          code(code_) {}

    void AddInstructions(BVPList<BVPDataInstruction *> & instructions);

    BVPDataDrain * ToDrain(void) {return this;}
    BVPDataFormat GetInputFormat(void) {return source1->format;}
};

//
// Instruction to merge two data elements
//
class BVPDataMerge : public BVPDataSourceDrainInstruction
{
public:
    BVPDataMergeCode code;

    BVPDataMerge(BVPDataMergeCode code_, BVPDataSource * source1_,
                 BVPDataSource * source2_)
        : BVPDataSourceDrainInstruction(BVPIT_MERGE, source1_->format * 2,
                                        source1_, source2_),
          code(code_) {}

    void AddInstructions(BVPList<BVPDataInstruction *> & instructions);

    BVPDataDrain * ToDrain(void) {return this;}
    BVPDataFormat GetInputFormat(void) {return source1->format;}
};

//
// Instruction to convert the basic vector elements of a data element into
// a different format (e.g. from signed 16 bit to unsigned 8 bit).
//
class BVPDataConvert : public BVPDataSourceDrainInstruction
{
public:
    BVPDataConvert(BVPBaseDataFormat target, BVPDataSource * source)
        : BVPDataSourceDrainInstruction(BVPIT_CONVERT,
              BVPDataFormat(target, source->format.num), source) {}

    void AddInstructions(BVPList<BVPDataInstruction *> & instructions);

    BVPDataDrain * ToDrain(void) {return this;}
    BVPDataFormat GetInputFormat(void) {return source1->format;}
};

//
// Basic data manipulation operation from two sources to one drain.
//
class BVPDataOperation : public BVPDataSourceDrainInstruction
{
public:
    BVPDataOperationCode code;

    BVPDataOperation(BVPDataOperationCode code_, BVPDataSource * source1_,
                     BVPDataSource * source2_)
        : BVPDataSourceDrainInstruction(BVPIT_OPERATION, source1_->format,
                                        source1_, source2_),
          code(code_) {}

    void AddInstructions(BVPList<BVPDataInstruction *> & instructions);

    BVPDataDrain * ToDrain(void) {return this;}
};

//
// Descriptor for one image block processing routine. It contains the
// arguments, the size and the dataflow graph. On destruction of the block
// processor all arguments, blocks and instructions are also deleted.
//
class BVPBlockProcessor
{
public:
    BVPInteger width;
    BVPInteger height;

    BVPList<BVPBlock *> blocks;
    BVPList<BVPDataInstruction *> instructions;
    BVPList<BVPArgument *> args;

    BVPBlockProcessor(void)
    {
    }

    ~BVPBlockProcessor(void);

    //
    // Add an argument to the list of arguments. Please note that the
    // arguments are added in the reverse order of the C calling convention.
    //
    void AddArgument(BVPArgument * arg)
    {
        arg->index = args.Num();
        args.Insert(arg);
    }

    //
    // Set the dimension of the operation rectangle. The width and height can
    // either be constants or arguments to the routine.
    //
    void SetDimension(BVPInteger width, BVPInteger height)
    {
        this->width = width;
        this->height = height;
    }

    //
    // Add a source block to the processing
    //
    void AddSourceBlock(BVPSourceBlock * block)
    {
        block->index = blocks.Num();
        blocks.Insert(block);
    }

    //
    // Add a target block to the processing
    //
    void AddTargetBlock(BVPTargetBlock * block)
    {
        block->index = blocks.Num();
        blocks.Insert(block);
    }

    //
    // Add an instruction to the dataflow graph. All referenced instructions
    // will also be added to the graph if they are not yet part of it.
    //
    void AddInstruction(BVPDataInstruction * ins)
    {
        ins->AddInstructions(instructions);
    }

    void GetOperationBits(int & minBits, int & maxBits);
};



#endif
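
The way a dataflow graph is assembled from such nodes can be illustrated with a minimal sketch. It is not the listing's implementation: the Node type, its string label, the use of std::vector in place of BVPList, and the function BuildGraphLabels are hypothetical simplifications, and the traversal order inside AddInstructions is an assumption based only on the comment that referenced instructions are added if they are not yet part of the graph.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the instruction classes of the
// listing. A string label replaces BVPInstructionType and std::vector
// replaces BVPList.
struct Node {
    std::string label;           // e.g. "loadA", "add", "store"
    int index;                   // -1 until the node is part of the graph
    std::vector<Node *> inputs;  // the (at most two) sources feeding the node

    Node(const std::string & label_, Node * in1 = nullptr, Node * in2 = nullptr)
        : label(label_), index(-1)
    {
        if (in1) inputs.push_back(in1);
        if (in2) inputs.push_back(in2);
    }

    // Assumed behaviour: add every referenced node first, then this node,
    // skipping nodes that are already part of the graph.
    void AddInstructions(std::vector<Node *> & instructions)
    {
        if (index != -1) return;
        for (Node * in : inputs) in->AddInstructions(instructions);
        index = static_cast<int>(instructions.size());
        instructions.push_back(this);
    }
};

// Builds target = average(sourceA + sourceB, sourceA) and returns the node
// labels in the order in which they entered the graph.
std::vector<std::string> BuildGraphLabels()
{
    Node loadA("loadA");
    Node loadB("loadB");
    Node add("add", &loadA, &loadB);
    Node average("average", &add, &loadA);  // loadA is shared by two nodes
    Node store("store", &average);

    std::vector<Node *> instructions;
    store.AddInstructions(instructions);    // pulls in all referenced nodes

    assert(loadA.index == 0);               // the shared node was added once

    std::vector<std::string> labels;
    for (Node * n : instructions) labels.push_back(n->label);
    return labels;
}
```

In this sketch only the store node is added explicitly; the nodes it depends on enter the list before it, so a translator walking the list from front to back would see every operand before the operation that consumes it.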



Although the invention is described herein with reference to the preferred
embodiment, one skilled in the art will readily appreciate that other applications
may be substituted for those set forth herein without departing from the spirit and
scope of the present invention. Accordingly, the invention should only be limited
by the claims included below.
Claims

1. An apparatus for generating computer assembly code, comprising:

an abstract routine generator for receiving a data stream comprising a
multimedia routine and for outputting a generic abstract representation thereof;
and

a translator for said abstract routine generator for receiving said abstract
representation and for outputting processor specific code for processing
multimedia input data.

2. The apparatus of Claim 1, wherein said abstract routine generator builds an
abstract routine during runtime.

3. The apparatus of Claim 1, wherein said abstract routine generator builds an
abstract routine in the form of a graph.

4. The apparatus of Claim 1, wherein said multimedia data comprise SIMD input
data.

5. The apparatus of Claim 1, wherein said multimedia data comprise image input
data.






Attorney Docket No. RAVI0009 

6. The apparatus of Claim 1, wherein said multimedia data comprise audio input
data.

7. The apparatus of Claim 3, wherein said graph is input to said translator.

8. The apparatus of Claim 3, wherein the output of said translator is in assembly
code.

9. The apparatus of Claim 1, wherein said translator's configuration can be
changed by use of a dynamic library link.

10. The apparatus of Claim 1, wherein said processor-specific code
performs any of the operations of add, sub, multiply, average, maximum,
minimum, compare, and, or, xor, pack, unpack, and merge on said input data.

11. The apparatus of Claim 3, wherein said graph is a function of any of source
block, target block, change in the block, color, stride, change in stride, display
block, and spatial filtering.

12. A method for generating assembly code, comprising:

providing an abstract routine generator for generating a generic abstract
representation of an input stream, said input comprising a multimedia routine; and

providing a translator for receiving said abstract representation from said
abstract routine generator and for outputting processor-specific code for
processing multimedia input data.

13. The method of Claim 12, wherein said abstract routine generator builds the
abstract routine during runtime.

14. The method of Claim 13, wherein said abstract routine is a graph.

15. The method of Claim 12, wherein said multimedia input data comprise
SIMD data.

16. The method of Claim 12, wherein said multimedia input data comprise image data.

17. The method of Claim 12, wherein said multimedia input data comprise
audio data.

18. The method of Claim 14, wherein said graph is input to said translator.

19. The method of Claim 12, wherein the output of said translator is assembly
code.

20. The method of Claim 12, wherein said processor-specific code performs
any of the operations of add, sub, multiply, average, maximum, minimum,
compare, and, or, xor, pack, unpack, and merge on said multimedia input data.

21. The method of Claim 14, wherein said graph is a function of any of source
block, target block, change in the block, color, stride, change in stride, display
block, and spatial filtering.

22. The method of Claim 12, wherein said translator can be changed by use of
a dynamic library link.


Abstract 

A method and apparatus for processing multimedia instruction enhanced 
data by the use of an abstract routine generator and a translator. The abstract 
routine generator takes the multimedia instruction enhanced data and generates 
abstract routines to compile the multimedia instruction enhanced data. The 
output of the abstract generator is an abstract representation of the multimedia 
instruction enhanced data. The translator then takes the abstract representation 
and produces code for processing. 